I am investigating the Gapminder dataset on child mortality, focusing on the effects of GDP and fertility rate on child mortality. Growth in GDP affects all stakeholders in a society. I am using data on GDP per capita, fertility rate, population, life expectancy and child mortality rate, and I will investigate the role of GDP as a key deciding factor.
!pip install missingno
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import missingno as msno
import seaborn as sns
%config InlineBackend.figure_format='svg'
# Set the seaborn environment for visualizations
sns.set()
Import the file "gdp_all.csv", which contains Gapminder data on life expectancy, child mortality, fertility, GDP and population for the countries of the world from 1964 to 2013. The data downloaded from Gapminder came in a different format, so I had to transpose each file and then combine them into a single file.
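The transpose-and-combine step is not shown in this notebook. As a rough sketch of one way to do it, assuming each downloaded file has one row per country and one column per year (an assumption about the download format), each table can be melted to long form and the results merged on country and year. The tables, values and the helper `to_long` below are made up for illustration.

```python
import pandas as pd

# Hypothetical wide-format tables, one per indicator (values are made up).
life_wide = pd.DataFrame(
    {"country": ["A", "B"], "1964": [50.0, 60.0], "1965": [51.0, 61.0]}
)
gdp_wide = pd.DataFrame(
    {"country": ["A", "B"], "1964": [1000.0, 2000.0], "1965": [1100.0, 2100.0]}
)


def to_long(wide, value_name):
    """Reshape one-column-per-year data into one row per country and year."""
    long = wide.melt(id_vars="country", var_name="year", value_name=value_name)
    long["year"] = long["year"].astype(int)
    return long


# Merge the long tables on the shared keys so each row holds every indicator.
combined = to_long(life_wide, "life").merge(
    to_long(gdp_wide, "gdp"), on=["country", "year"], how="outer"
)
combined = combined.sort_values(["country", "year"]).reset_index(drop=True)
print(combined)
```

An outer merge keeps country-year pairs that appear in only one file, which is where null values such as those examined later would come from.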
df_gdp = pd.read_csv('gdp_all.csv')
df_gdp.head()
# Check the number of rows and columns of the dataframe
df_gdp.shape
# Check the data types for all the columns
df_gdp.info()
# Perform statistical analysis of the numeric columns
df_gdp.describe()
The summary statistics give the range of each indicator across all countries and years: life expectancy spans 0.2 to 83.4, child mortality spans approximately 2 to 435 (the widest range), and fertility spans 0.8 to 9.2. These are minimum and maximum values, not changes over time. All three indicators are related to GDP, which serves as a gauge of an economy's overall size and health and indicates whether people have access to health care and the basic necessities of life.
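To read a change over time from the data rather than an overall range, one option is to average each indicator per year and compare the first and last years. The sketch below uses a small made-up frame named `toy` in place of `df_gdp`; only the column names `year`, `life` and `child_mortality` are taken from this notebook.

```python
import pandas as pd

# Toy stand-in for df_gdp; all values are made up for illustration.
toy = pd.DataFrame({
    "year": [1964, 1964, 2013, 2013],
    "country": ["A", "B", "A", "B"],
    "life": [50.0, 60.0, 70.0, 80.0],
    "child_mortality": [200.0, 100.0, 40.0, 10.0],
})

# Mean of each indicator per year, then keep only the first and last year.
yearly_mean = toy.groupby("year")[["life", "child_mortality"]].mean()
first_last = yearly_mean.loc[[yearly_mean.index.min(), yearly_mean.index.max()]]

# Change between the two endpoints (last year minus first year).
change = first_last.iloc[1] - first_last.iloc[0]
print(first_last)
print(change)
```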
#Take a look at the data distribution
df_gdp.hist(figsize=(10,8));
Child mortality, GDP and population are skewed to the right: most values fall in the lowest bins, with a long tail of high values, and the mean and the median fall in the same bin. Life expectancy, by contrast, is skewed to the left.
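The direction of skew can be checked numerically as well as by eye. A minimal sketch, using two made-up series rather than the real columns: `Series.skew()` is positive when the long tail is on the right and negative when it is on the left.

```python
import pandas as pd

# Made-up samples: one with a long right tail, one with a long left tail.
right_tailed = pd.Series([1, 1, 2, 2, 3, 3, 4, 50])
left_tailed = pd.Series([1, 47, 48, 48, 49, 49, 50, 50])

# Sample skewness: > 0 for a right tail, < 0 for a left tail.
print(right_tailed.skew())
print(left_tailed.skew())

# In a right-skewed sample the mean is pulled above the median.
print(right_tailed.mean(), right_tailed.median())
```

On the real data the same check would be `df_gdp.skew(numeric_only=True)`.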
# Check for the total number of Null data values
df_gdp.isna().sum()
#Visualize the null data values of the numeric columns
msno.bar(df_gdp, figsize = (10,5));
There are only 3 null values in the population column, 11 in fertility, 901 in child_mortality and 1111 in gdp. I will drop the rows with null population values and fill the null values of child_mortality, gdp and fertility with the column means.
#Drop the null values from the population column
df_gdp.dropna(axis=0, subset=['population'],inplace=True)
#Check the number of null values in the remaining columns
df_gdp.isna().sum()
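# Fill the null data values with the column means
# (numeric_only=True skips text columns such as country and region, which recent pandas versions refuse to average)
df_gdp.fillna(df_gdp.mean(numeric_only=True),inplace=True)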
#Make sure there are no null data values
msno.bar(df_gdp, figsize = (10,5));
GDP is my independent variable, and child mortality and life expectancy are my dependent variables. I expect child mortality to go down and life expectancy to go up as GDP per capita increases, and vice versa. I also expect an inverse relationship between population growth and GDP per capita.
#Check the distribution of child mortality data
df_gdp.child_mortality.hist(alpha=0.5)
This histogram shows the distribution of child mortality across all countries and years; on its own it does not show a trend over time. Child mortality did decrease from 1964 to 2013, as the yearly plots later in this notebook show for selected countries, and this might be attributed to the considerable advances in the field of medicine.
df_gdp.groupby('year')['gdp'].mean().plot(kind='bar', title='GDP Per Capita Income',alpha=0.7);
plt.xticks(size=7,rotation=90);
plt.yticks(size=9);
GDP per capita grew from 1964 to 2013. There were a few ups and downs, but the overall trend is upward.
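# Correlate the numeric columns only; text columns such as country and region cannot be correlated
sns.heatmap(df_gdp.corr(numeric_only=True), square=True, cmap='coolwarm_r');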
There is a negative correlation between GDP and Child Mortality.
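The heatmap shows the sign of the correlation by colour only. A minimal sketch of how the coefficient itself could be read off, using made-up values in a frame named `toy`: Pearson measures linear association, while a rank-based (Spearman) coefficient, computed here as Pearson on the ranks, may suit a curved relationship such as GDP against child mortality better.

```python
import pandas as pd

# Made-up values: mortality falls steeply at first, then flattens as GDP rises.
toy = pd.DataFrame({
    "gdp": [500.0, 1000.0, 5000.0, 20000.0, 40000.0],
    "child_mortality": [200.0, 150.0, 60.0, 10.0, 5.0],
})

# Pearson correlation: strength of the linear relationship.
pearson = toy["gdp"].corr(toy["child_mortality"])

# Spearman correlation, computed as Pearson on the ranks of each column.
ranks = toy.rank()
spearman = ranks["gdp"].corr(ranks["child_mortality"])

print(round(pearson, 3), round(spearman, 3))
```

On the real data the equivalent lookup would be `df_gdp[['gdp', 'child_mortality']].corr()`.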
df1=df_gdp.groupby(['region'])
df1.ngroups
df1.head(1)
#Plotting child mortality in different regions of the world
df_gdp.groupby('region')['child_mortality'].mean().plot(kind='bar', title='Child Mortality',alpha=0.7);
Sub-Saharan Africa has the highest mean child mortality rate, whereas Europe & Central Asia has the lowest.
#Plotting mean GDP for different regions of the world
df_gdp.groupby('region')['gdp'].mean().plot(kind='bar', title='Mean GDP',alpha=0.7);
South Asia has the lowest mean GDP and Middle East & North Africa has the highest mean GDP.
# Plotting life expectancy against different regions of the world
df_gdp.groupby('region')['life'].mean().plot(kind='bar', title='Life Expectancy',alpha=0.7);
Mean life expectancy is above 70 years for Europe & Central Asia and over 50 years for Sub-Saharan Africa.
df1.head(1)
# Scatter plot of GDP per capita against child mortality; point colour and size encode life expectancy
plt.figure(figsize=(10,5))
plt.scatter(x=df_gdp['gdp'],y=df_gdp['child_mortality'],c=df_gdp['life'] ,
s=df_gdp['life'] ,cmap='rainbow', alpha=0.7)
plt.colorbar().set_label('Life Expectancy',fontsize=14)
plt.xlabel('GDP Per Capita')
plt.ylabel('Child Mortality');
This scatter plot shows an inverse relationship between GDP per capita and child mortality, and between child mortality and life expectancy.
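# Select the records for the East Asia & Pacific and America regions
# (.copy() gives an independent dataframe, so later in-place changes do not trigger SettingWithCopyWarning)
df_EA = df_gdp[(df_gdp.region == 'East Asia & Pacific')| (df_gdp.region == 'America')].copy()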
df_EA.head(1)
# Reset the index and discard the old one in a single step
df_EA = df_EA.reset_index(drop=True)
df_EA.head()
df_EA.groupby('region')['child_mortality'].mean().plot(kind='bar', title='Child Mortality',alpha=0.7, color=['pink', 'purple']);
East Asia & Pacific has a higher mean child mortality rate, whereas America's is slightly lower.
df_EA.groupby('region')['life'].mean().plot(kind='bar', title='Life Expectancy',alpha=0.7, color=['blue', 'cyan']);
America has a mean life expectancy of almost 70 years, whereas East Asia & Pacific has around 68 years. Life expectancy is therefore similar in the two regions.
# Build a dataframe holding, for each country, the record with its highest child mortality rate
cols = ["year","country","fertility","life","population","child_mortality","gdp","region"]
# idxmax returns the index label of each country's maximum child_mortality row
idx_max = df_gdp.groupby('country', sort=False)['child_mortality'].idxmax()
df_max = df_gdp.loc[idx_max, cols].reset_index(drop=True)
#Top 10 countries with the highest child mortality rate
df_top10=df_max.nlargest(10,'child_mortality')
#Reset the index and discard the old one
df_top10.reset_index(drop=True, inplace=True)
df_top10
#Top 10 countries with the lowest child mortality rate
df_bottom10=df_max.nsmallest(10,'child_mortality')
#Reset the index and discard the old one
df_bottom10.reset_index(drop=True, inplace=True)
df_bottom10.head()
#Visualize the data using a horizontal bar chart
df_top10.plot(x='country',y='child_mortality',kind='barh',legend=False,alpha=0.7);
plt.title('Top 10 countries with Highest child mortality',size=17);
plt.xlabel('Child Mortality');
plt.ylabel('Country');
df_bottom10.plot(x='country',y='child_mortality',kind='barh',legend=False, alpha=0.7);
plt.title('Top 10 countries with Lowest child mortality',size=17);
plt.xlabel('Child Mortality');
plt.ylabel('Country');
#Visualize the relationship between the countries with high mortality rate and GDP
df_top10.plot(x='child_mortality',y='gdp',kind='barh',legend=False,alpha=0.7);
plt.title('Highest child mortality Vs GDP',size=17);
# In a horizontal bar chart the y column (gdp) runs along the horizontal axis
plt.xlabel('GDP');
plt.ylabel('Child Mortality');
#Visualize the relationship between the countries with low mortality rate and GDP
df_bottom10.plot(x='child_mortality',y='gdp',kind='barh',legend=False,alpha=0.7);
plt.title('Lowest child mortality Vs GDP',size=17);
plt.xlabel('GDP');
plt.ylabel('Child Mortality');
It shows that the 10 countries with the highest child mortality rates have a low GDP per capita, whereas the 10 countries with the lowest child mortality rates have a high GDP per capita.
# Plot GDP per capita against child mortality for each of the three countries
for i in ['Bangladesh','Pakistan','India']:
    plt.plot(df_gdp[df_gdp.country==i]['child_mortality'],df_gdp[df_gdp.country==i]['gdp'])
plt.xlabel('Child Mortality');
plt.ylabel('GDP Per Capita');
plt.title('Child Mortality vs GDP Trend for Bangladesh, Pakistan and India')
plt.legend(['Bangladesh','Pakistan','India'],frameon=True);
plt.show()
India has shown a strong improvement in child mortality as its GDP has risen. Pakistan's economy is also growing, and its child mortality rate has fallen alongside it. Despite its slow-growing economy, Bangladesh has shown the most improvement in child mortality.
# Plot child mortality over time for each of the three countries
for i in ['Bangladesh','Pakistan','India']:
    plt.plot(df_gdp[df_gdp.country==i]['year'],df_gdp[df_gdp.country==i]['child_mortality'])
plt.xlabel('Year');
plt.ylabel('Child Mortality');
plt.title('Child Mortality Trend from 1964 to 2013 for Bangladesh, Pakistan and India')
plt.legend(['Bangladesh','Pakistan','India'],frameon=True);
plt.show()
Bangladesh has been more successful than its two neighbours in improving the child mortality rate. In 1964 it was around 250, whereas by 2013 it had gone down to less than 50.
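The size of that improvement can be put as a relative decline. A small sketch, using the approximate figures quoted above (about 250 and about 50, read off the chart rather than exact data values) and a hypothetical helper named `percent_decline`:

```python
def percent_decline(start, end):
    """Return the decline from start to end as a percentage of start."""
    if start <= 0:
        raise ValueError("start must be positive")
    return 100.0 * (start - end) / start


# Approximate values read off the chart, not exact figures from the dataset.
bangladesh_1964 = 250.0
bangladesh_2013 = 50.0

print(percent_decline(bangladesh_1964, bangladesh_2013))
```

Because the 2013 value is described as less than 50, the actual decline would be somewhat larger than this figure.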
df_gdp[(df_gdp.country == 'Bangladesh')].describe()
Child mortality went down across the whole world from 1964 to 2013. One likely contributing factor is the rise in GDP per capita across countries. The trend is better for the developed countries, whose GDP per capita is higher than that of developing and poor nations. The countries in the East Asia & Pacific and America regions show the same trends in GDP, child mortality and life expectancy. Among the three South Asian countries compared, the increase in GDP has gone together with lower child mortality in Bangladesh, India and Pakistan, and Bangladesh has been more successful than India and Pakistan in bringing its child mortality rate down.
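from subprocess import call
# Export the notebook to HTML; recent nbconvert versions require the output format to be given with --to
call(['python', '-m', 'nbconvert', '--to', 'html', 'Project#2_Child Mortality _Investigation.ipynb'])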